
Keyword Search Result

[Keyword] deep learning (149 hits)

Showing 121-140 of 149 hits

  • Image Denoiser Using Convolutional Neural Network with Deconvolution and Modified Residual Network

    Soo-Yeon SHIN  Dong-Myung KIM  Jae-Won SUH  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2019/05/14
      Vol:
    E102-D No:8
      Page(s):
    1598-1601

    Due to improvements in hardware and software performance, deep learning algorithms have been used in many areas and have shown good results. In this paper, we propose a noise reduction framework based on a convolutional neural network (CNN) with deconvolution and a modified residual network (ResNet) to remove image noise. Simulation results show that the proposed algorithm is superior to the conventional noise eliminator in subjective and objective performance analyses.
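
    A minimal sketch (assuming PyTorch) of the kind of architecture described here: a convolutional encoder, a deconvolution (transposed convolution) that restores resolution, and a residual formulation in which the network predicts the noise. Layer counts and channel widths are illustrative, not the authors' exact configuration.

    ```python
    import torch
    import torch.nn as nn

    class SimpleDenoiser(nn.Module):
        """Toy conv/deconv denoiser with a residual (noise-predicting) output."""
        def __init__(self, channels=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            )
            # Deconvolution restores the spatial resolution lost by the strided conv.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, 3, padding=1),
            )

        def forward(self, noisy):
            noise = self.decoder(self.encoder(noisy))  # predict the noise component
            return noisy - noise                       # subtract it from the noisy input

    denoised = SimpleDenoiser()(torch.randn(1, 1, 64, 64))  # smoke test
    ```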

  • Webly-Supervised Food Detection with Foodness Proposal Open Access

    Wataru SHIMODA  Keiji YANAI  

     
    PAPER

      Publicized:
    2019/04/25
      Vol:
    E102-D No:7
      Page(s):
    1230-1239

    To minimize the annotation costs associated with training semantic segmentation models and object detection models, weakly supervised detection and weakly supervised segmentation approaches have been extensively studied. However, most of these approaches assume that the training and testing domains are the same, which at times results in considerable performance drops. For example, if we train an object detection network using only web images showing a large object at the center, it can be difficult for the network to detect multiple small objects. In this paper, we focus on training a CNN with only web images and achieve object detection in the wild. A proposal-based approach can address the problem of domain difference because web images are similar to proposal regions: in both, the target object is located at the center of the image and the ratio of the object size to the image size is large. Several proposal methods have been proposed to detect regions with high "object-ness." However, many of them generate a large number of candidates to increase the recall rate. Considering the recent advent of deep CNNs, methods that generate a large number of proposals are problematic in terms of processing time for practical use. Therefore, we propose a CNN-based "food-ness" proposal method in this paper that requires neither pixel-wise annotation nor bounding box annotation. Our method generates proposals through backpropagation, and most of these proposals focus only on food objects. In addition, we can easily control the number of proposals. Through experiments, we trained a network model using only web images and tested the model on the UEC FOOD 100 dataset. We demonstrate that the proposed method achieves high performance compared to traditional proposal methods in terms of the trade-off between accuracy and computational cost. In summary, we propose an intermediate approach between the traditional proposal approach and the fully convolutional approach: a novel proposal method that generates high "food-ness" regions using fully convolutional networks in a backward pass, trained on food images gathered from the web.
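
    Proposal generation via backpropagation can be illustrated with a saliency-style backward pass: the gradient of a class score with respect to the input highlights object regions, from which a box can be cut. The tiny classifier and threshold below are placeholders, not the paper's trained food network.

    ```python
    import torch

    # Stand-in classifier; in the paper a CNN trained on web food images is used.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 10),
    ).eval()

    image = torch.randn(1, 3, 224, 224, requires_grad=True)
    score = model(image).max()          # score of the most likely class
    score.backward()                    # backpropagate the score to the input

    # Saliency: per-pixel gradient magnitude, max over colour channels.
    saliency = image.grad.abs().max(dim=1)[0].squeeze(0)
    mask = saliency > saliency.mean() + saliency.std()   # crude "object-ness" mask

    ys, xs = torch.nonzero(mask, as_tuple=True)
    if len(xs) > 0:
        box = (xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item())
        print("proposal box (x1, y1, x2, y2):", box)
    ```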

  • GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers

    Hiroki NAKAHARA  Haruyoshi YONEKAWA  Tomoya FUJII  Masayuki SHIMODA  Shimpei SATO  

     
    PAPER-Design Tools

      Publicized:
    2019/02/27
      Vol:
    E102-D No:5
      Page(s):
    1003-1011

    GUINNESS (GUI based binarized neural network synthesizer) is an open-source tool flow for implementing a binarized deep neural network on an FPGA; through a GUI it covers both training on a GPU and inference on the FPGA. Since all operations are performed from the GUI, the software designer does not need to write any scripts to define the network structure or training behavior, and only specifies values for the hyperparameters. After training finishes, the tool automatically generates C++ code and synthesizes the bit-stream using the Xilinx SDSoC system design tool flow. Thus, our tool flow is suitable for software programmers who are not familiar with FPGA design. In our tool flow, we modify the algorithms for both training and inference to suit binarized CNN hardware. Since the hardware has limited bit precision, the minimal bias needed during training is lacking; also, for inference on the hardware, the conventional batch normalization technique requires additional circuitry. Our modifications solve both problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with conventional binarized implementations on an FPGA, the classification accuracy was almost the same, while the performance per power efficiency was 5.1 times better, the performance per area efficiency was 8.0 times better, and the performance per memory was 8.2 times better. We also compared the proposed FPGA design with CPU and GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, dissipated 3.0 times less power, and its performance per power efficiency was 5706.3 times better. Compared with the Maxwell GPU, it was 11.5 times faster, dissipated 7.3 times less power, and its performance per power efficiency was 83.0 times better. A disadvantage of our FPGA-based design is the additional time required to synthesize the FPGA executable code: in our experiment this took more than three hours, and the total FPGA design took 75 hours. Since training the CNN dominates this total, the synthesis overhead is considered acceptable.
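
    Binarized training of the kind GUINNESS targets typically uses a sign function in the forward pass and a straight-through estimator in the backward pass. A generic sketch of that trick (assuming PyTorch), not the tool's actual code:

    ```python
    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, w):
            ctx.save_for_backward(w)
            return torch.sign(w)              # forward pass uses {-1, +1} weights

        @staticmethod
        def backward(ctx, grad_output):
            (w,) = ctx.saved_tensors
            # Straight-through: pass the gradient where |w| <= 1, block it elsewhere.
            return grad_output * (w.abs() <= 1).float()

    w = torch.randn(4, 4, requires_grad=True)   # real-valued "shadow" weights
    x = torch.randn(4)
    y = (BinarizeSTE.apply(w) @ x).sum()
    y.backward()                                 # gradients still reach the real weights
    print(w.grad)
    ```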

  • RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks

    Cheng LUO  Wei CAO  Lingli WANG  Philip H. W. LEONG  

     
    PAPER-Applications

      Publicized:
    2019/02/19
      Vol:
    E102-D No:5
      Page(s):
    1037-1045

    With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
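
    The shift-and-accumulate idea can be sketched directly: if weights are rounded to powers of two (a logarithmic number system), each multiplication becomes a bit shift. A plain-Python illustration with integer activations; the exact quantization used in RNA differs.

    ```python
    import math

    def quantize_pow2(w):
        """Return (sign, exponent) with |w| rounded to the nearest power of two."""
        if w == 0:
            return 0, 0
        return (1 if w > 0 else -1), round(math.log2(abs(w)))

    def shift_accumulate(activations, weights):
        """Dot product using shifts instead of multiplications."""
        acc = 0
        for a, w in zip(activations, weights):
            sign, exp = quantize_pow2(w)
            shifted = a << exp if exp >= 0 else a >> -exp   # multiply by 2**exp
            acc += sign * shifted
        return acc

    print(shift_accumulate([3, 5, 2], [0.5, 2.0, -4.0]))  # 3*0.5 + 5*2 - 2*4 = 3.5, prints 3
    ```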

  • A Highly Accurate Transportation Mode Recognition Using Mobile Communication Quality

    Wataru KAWAKAMI  Kenji KANAI  Bo WEI  Jiro KATTO  

     
    PAPER

      Publicized:
    2018/10/15
      Vol:
    E102-B No:4
      Page(s):
    741-750

    We demonstrate that transportation modes can be recognized from communication quality factors alone, without any additional sensor devices. In the demonstration, instead of using global positioning system (GPS) and accelerometer sensors, we collect mobile TCP throughputs, received-signal strength indicators (RSSIs), and cellular base-station IDs (Cell IDs) through in-line network measurement while the user enjoys mobile services, such as video streaming. In the accuracy evaluations, we conduct two different field experiments to collect data in six typical transportation modes (static, walking, riding a bicycle, riding a bus, riding a train, and riding a subway), and then construct classifiers by applying a support-vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and convolutional neural network (CNN). Our results show that these transportation modes can be recognized with high accuracy from communication quality factors, comparable to the accuracy obtained with accelerometer sensors.
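
    A hedged sketch of the classification step (assuming scikit-learn): communication quality statistics are collected into feature vectors and fed to one of the listed classifiers, here a random forest. The feature layout and data below are placeholders, not the paper's dataset.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # Each row: [mean throughput, std throughput, mean RSSI, std RSSI, cell-ID changes]
    X = rng.normal(size=(600, 5))
    y = rng.integers(0, 6, size=600)   # six modes: static, walk, bicycle, bus, train, subway

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print(cross_val_score(clf, X, y, cv=5).mean())   # chance-level here, since the data are random
    ```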

  • Feature Selection of Deep Learning Models for EEG-Based RSVP Target Detection Open Access

    Jingxia CHEN  Zijing MAO  Ru ZHENG  Yufei HUANG  Lifeng HE  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2019/01/22
      Vol:
    E102-D No:4
      Page(s):
    836-844

    Most recent work has used raw electroencephalograph (EEG) data to train deep learning (DL) models, with the assumption that DL models can learn discriminative features by themselves. It is not yet clear what kind of RSVP-specific features can be selected and combined with raw EEG data to improve the RSVP classification performance of DL models. In this paper, we extract RSVP-specific features and combine them with raw EEG data to capture more spatial and temporal correlations of target and non-target events and to improve EEG-based RSVP target detection performance. We report experimental results on the X2 Expertise RSVP dataset. We conduct detailed performance evaluations of different features and feature combinations with traditional classification models and different CNN models in within-subject and cross-subject tests. Compared with the state-of-the-art traditional Bagging Tree (BT) and Bayesian Linear Discriminant Analysis (BLDA) classifiers, our proposed combined features with CNN models achieve 1.1% better performance in the within-subject test and 2% better performance in the cross-subject test. These results shed light on the ability of the combined features to serve as an efficient tool in RSVP target detection with deep learning models and thus improve RSVP target detection performance.

  • Faster-ADNet for Visual Tracking

    Tiansa ZHANG  Chunlei HUO  Zhiqiang ZHOU  Bo WANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/12/12
      Vol:
    E102-D No:3
      Page(s):
    684-687

    By taking advantage of deep learning and reinforcement learning, ADNet (Action Decision Network) outperforms other approaches. However, its speed and performance are still limited by factors such as unreliable confidence score estimation and redundant historical actions. To address these limitations, a faster and more accurate approach named Faster-ADNet is proposed in this paper. By optimizing the tracking process via a status re-identification network, the proposed approach is more efficient and 6 times faster than ADNet. At the same time, accuracy and stability are enhanced by removing redundant historical actions. Experiments demonstrate the advantages of Faster-ADNet.

  • Rectifying Transformation Networks for Transformation-Invariant Representations with Power Law

    Chunxiao FAN  Yang LI  Lei TIAN  Yong LI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/12/04
      Vol:
    E102-D No:3
      Page(s):
    675-679

    This letter proposes a representation learning framework for convolutional neural networks (Convnets) that aims to rectify and improve the feature representations learned by existing transformation-invariant methods. The existing methods usually encode feature representations invariant to a wide range of spatial transformations by augmenting input images or transforming intermediate layers. Unfortunately, simply transforming the intermediate feature maps may lead to unpredictable representations that are ineffective in describing the transformed features of the inputs. The reason is that convolution and geometric transformation are not commutative in most cases, so exchanging the two operations yields a transformation error. This error may harm the performance of the classification networks. Motivated by the fractal statistics of natural images, this letter proposes a rectifying transformation operator to minimize the error. The proposed method is differentiable and can be inserted into the convolutional architecture without any modification to the optimization algorithm. We show that the rectified feature representations result in better classification performance on two benchmarks.
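
    The non-commutativity that motivates the rectifying operator is easy to verify numerically (assuming NumPy/SciPy): rotating and then convolving differs from convolving and then rotating.

    ```python
    import numpy as np
    from scipy.ndimage import convolve, rotate

    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    kernel = rng.random((3, 3))          # a non-symmetric kernel

    a = convolve(rotate(image, 45, reshape=False), kernel)   # transform, then convolve
    b = rotate(convolve(image, kernel), 45, reshape=False)   # convolve, then transform

    print(np.abs(a - b).mean())          # clearly non-zero: the operations do not commute
    ```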

  • Personal Data Retrieval and Disambiguation in Web Person Search

    Yuliang WEI  Guodong XIN  Wei WANG  Fang LV  Bailing WANG  

     
    LETTER-Data Engineering, Web Information Systems

      Publicized:
    2018/10/24
      Vol:
    E102-D No:2
      Page(s):
    392-395

    Web person search often returns web pages related to several distinct namesakes. This paper proposes a new web page model for template-free person data extraction and uses a Dirichlet Process Mixture model for name disambiguation. The results show that our method works best on web pages with complex structure.
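
    As a hedged illustration of the disambiguation step, a Dirichlet-process mixture can be fitted with scikit-learn's BayesianGaussianMixture; the page feature vectors below are random placeholders for whatever representation the paper extracts.

    ```python
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    pages = rng.normal(size=(50, 8))      # 50 web pages, 8-dimensional features

    dpmm = BayesianGaussianMixture(
        n_components=10,                                  # upper bound on cluster count
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    )
    labels = dpmm.fit_predict(pages)      # pages assigned to the same cluster share a namesake
    print("namesake clusters found:", len(set(labels)))
    ```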

  • Security Consideration for Deep Learning-Based Image Forensics

    Wei ZHAO  Pengpeng YANG  Rongrong NI  Yao ZHAO  Haorui WU  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/08/24
      Vol:
    E101-D No:12
      Page(s):
    3263-3266

    Recently, the image forensics community has paid attention to designing effective algorithms based on deep learning, and it has been shown that combining the domain knowledge of image forensics with deep learning achieves more robust and better performance than traditional schemes. Instead of improving algorithm performance, this paper considers the security of deep learning-based methods in the field of image forensics. To the best of our knowledge, this is the first work focusing on this topic. Specifically, we experimentally find that deep learning-based methods fail when slight noise is added to the images (adversarial images). Furthermore, two strategies are proposed to enforce the security of deep learning-based methods. First, a penalty term is added to the loss function, namely the 2-norm of the gradient of the loss with respect to the input images; second, a novel training method is adopted that trains the model on a fusion of normal and adversarial images. Experimental results show that the proposed algorithm achieves good performance even on adversarial images and provides a security consideration for deep learning-based image forensics.
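
    The first strategy, a penalty on the 2-norm of the gradient of the loss with respect to the input image, can be sketched with double backpropagation (assuming PyTorch). The model, data, and the weight lam below are placeholders.

    ```python
    import torch
    import torch.nn.functional as F

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 2))
    images = torch.randn(8, 3, 32, 32, requires_grad=True)
    labels = torch.randint(0, 2, (8,))
    lam = 0.1                                              # penalty weight (assumed)

    loss = F.cross_entropy(model(images), labels)
    grad_input, = torch.autograd.grad(loss, images, create_graph=True)
    penalty = grad_input.flatten(1).norm(dim=1).mean()     # 2-norm of the input gradient

    total = loss + lam * penalty
    total.backward()    # gradients flow through the penalty into the model weights
    ```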

  • Sparse Graph Based Deep Learning Networks for Face Recognition

    Renjie WU  Sei-ichiro KAMATA  

     
    PAPER

      Publicized:
    2018/06/20
      Vol:
    E101-D No:9
      Page(s):
    2209-2219

    In recent years, deep learning based approaches have substantially improved the performance of face recognition. Most existing deep learning techniques work well, but neglect effective utilization of face correlation information. The resulting performance loss is noteworthy for personal appearance variations caused by factors such as illumination, pose, occlusion, and misalignment. We believe that face correlation information should be introduced to solve this performance problem, which originates from intra-personal variations. Recently, graph deep learning approaches have emerged for representing structured graph data. A graph is a powerful tool for representing the complex information of a face image. In this paper, we survey recent research related to the graph structure of convolutional neural networks and devise a definition of the graph structure underlying compressed sensing and deep learning. The paper explains two properties of our graph: sparsity and depth. Sparsity is advantageous because features are more likely to be linearly separable and more robust; depth means that the learning process is multi-resolution and multi-channel. We expect a sparse graph based deep neural network to more effectively make similar objects attract each other and different objects repel each other, akin to a better sparse multi-resolution clustering. Based on this concept, we propose a sparse graph representation based on face correlation information that is embedded via sparse reconstruction and deep learning within an irregular domain. The resulting classification is remarkably robust. The proposed method achieves high recognition rates of 99.61% (94.67%) on the benchmark LFW (YTF) facial evaluation databases.

  • A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats

    Hang CUI  Shoichi HIRASAWA  Hiroaki KOBAYASHI  Hiroyuki TAKIZAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/06/13
      Vol:
    E101-D No:9
      Page(s):
    2307-2314

    Sparse Matrix-Vector multiplication (SpMV) is a computational kernel widely used in many applications. Because of its importance, many different implementations have been proposed to accelerate this computational kernel. The performance characteristics of these SpMV implementations differ considerably, and it is generally difficult to select the implementation with the best performance for a given sparse matrix without performance profiling. One existing approach to the SpMV best-code selection problem uses manually predefined features and a machine learning model for the selection. However, it is generally hard to manually define features that perfectly express the characteristics of the original sparse matrix needed for code selection, and some information is lost with this approach. This paper hence presents an effective deep learning mechanism for selecting the SpMV code best suited to a given sparse matrix. Instead of using manually predefined features of a sparse matrix, a feature image and a deep learning network are used to map each sparse matrix, in advance of execution, to the implementation expected to have the best performance. The benefits of the proposed mechanism are discussed in terms of prediction accuracy and performance. According to the evaluation, the proposed mechanism can select an optimal or suboptimal implementation for an unseen sparse matrix in the test data set in most cases. These results demonstrate that, by using deep learning, a whole sparse matrix can be used to predict the best implementation, and the prediction accuracy achieved by the proposed mechanism is higher than that obtained with predefined features.
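
    A hedged sketch of the feature-image idea: the sparse matrix is binned into a fixed-size grid whose pixels count nonzeros, producing an image a CNN can classify. The bin size and normalization are illustrative, not the paper's exact preprocessing.

    ```python
    import numpy as np
    import scipy.sparse as sp

    def matrix_to_feature_image(matrix, size=32):
        """Map an arbitrary sparse matrix to a size x size nonzero-density image."""
        coo = matrix.tocoo()
        rows = coo.row * size // matrix.shape[0]
        cols = coo.col * size // matrix.shape[1]
        image = np.zeros((size, size))
        np.add.at(image, (rows, cols), 1.0)          # count nonzeros per block
        return image / image.max() if image.max() > 0 else image

    A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
    print(matrix_to_feature_image(A).shape)          # (32, 32), ready for a CNN classifier
    ```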

  • Transform Electric Power Curve into Dynamometer Diagram Image Using Deep Recurrent Neural Network

    Junfeng SHI  Wenming MA  Peng SONG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2018/05/09
      Vol:
    E101-D No:8
      Page(s):
    2154-2158

    To learn the working condition of rod-pumped wells underground, we need to analyze dynamometer diagrams, which are generated by a load sensor and a displacement sensor. Rod-pumped wells are usually located in places with extreme weather, and these sensors are installed on special oil equipment in the open air. As time goes by, the sensors are prone to generating unstable and incorrect data. Unfortunately, load sensors are too expensive to frequently reinstall. Therefore, the resulting dynamometer diagrams sometimes cannot support an accurate diagnosis. In contrast, as an indispensable component of the rod-pumped well, the electric motor has a much longer life and is not easily affected by the weather. The electric power curve during a swabbing period can also reflect the working condition underground, but it is much harder to interpret than the dynamometer diagram. This letter presents a novel deep learning architecture that transforms the electric power curve into a dimensionless dynamometer diagram image. We conduct experiments on a real-world dataset, and the results show that our method achieves impressive transformation accuracy.
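
    One plausible reading of such an architecture (not necessarily the letter's exact network) is a recurrent encoder over the power curve followed by a decoder that emits the diagram image; a minimal sketch assuming PyTorch, with illustrative sizes.

    ```python
    import torch
    import torch.nn as nn

    class PowerCurveToDiagram(nn.Module):
        def __init__(self, hidden=128, image_size=64):
            super().__init__()
            self.image_size = image_size
            self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
            self.decoder = nn.Linear(hidden, image_size * image_size)

        def forward(self, power_curve):                     # (batch, time, 1)
            _, (h_n, _) = self.encoder(power_curve)         # summarize the swabbing period
            image = torch.sigmoid(self.decoder(h_n[-1]))    # pixel values in [0, 1]
            return image.view(-1, 1, self.image_size, self.image_size)

    diagram = PowerCurveToDiagram()(torch.randn(2, 200, 1))
    print(diagram.shape)    # torch.Size([2, 1, 64, 64])
    ```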

  • Performance Evaluation of Pipeline-Based Processing for the Caffe Deep Learning Framework

    Ayae ICHINOSE  Atsuko TAKEFUSA  Hidemoto NAKADA  Masato OGUCHI  

     
    PAPER

      Publicized:
    2018/01/18
      Vol:
    E101-D No:4
      Page(s):
    1042-1052

    Many life-log analysis applications, which transfer data from cameras and sensors to a Cloud and analyze them in the Cloud, have been developed as the use of various sensors and Cloud computing technologies has spread. However, difficulties arise because of the limited network bandwidth between such sensors and the Cloud. In addition, sending raw sensor data to a Cloud may introduce privacy issues. Therefore, we propose a pipelined method for distributed deep learning processing between sensors and the Cloud to reduce the amount of data sent to the Cloud and protect the privacy of users. In this study, we measured the processing times and evaluated the performance of our method using two different datasets. In addition, we performed experiments using three types of machines with different performance characteristics on the client side and compared the processing times. The experimental results show that the accuracy of deep learning with coarse-grained data is comparable to that achieved with the default parameter settings, and the proposed distributed processing method has performance advantages in cases of insufficient network bandwidth between realistic sensors and a Cloud environment. In addition, it is confirmed that the process that most affects the overall processing time varies depending on the machine performance on the client side, and the most efficient distribution method similarly differs.
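
    The pipelined idea can be sketched by splitting one model into a client-side front end and a cloud-side back end so that only intermediate features cross the network. The split point and layer sizes below are illustrative assumptions, not the paper's configuration.

    ```python
    import torch
    import torch.nn as nn

    client_part = nn.Sequential(            # runs near the sensor
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, stride=2, padding=1), nn.ReLU(),
    )
    cloud_part = nn.Sequential(             # runs in the Cloud
        nn.Flatten(), nn.Linear(16 * 56 * 56, 10),
    )

    frame = torch.randn(1, 3, 224, 224)
    features = client_part(frame)           # intermediate features, smaller than the raw frame
    # ... features would be serialized and sent over the network here ...
    print(features.numel(), "values sent instead of", frame.numel())
    print(cloud_part(features).shape)       # classification finished in the Cloud
    ```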

  • Sequential Convolutional Residual Network for Image Recognition

    Wonjun HWANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/01/18
      Vol:
    E101-D No:4
      Page(s):
    1213-1216

    In this letter, we propose a sequential convolutional residual network, where we first analyze a tangled network architecture using simplified equations and determine the critical point to untangle the complex network architecture. Although the residual network shows good performance, the learning efficiency is not better than expected at deeper layers because the network is excessively intertwined. To solve this problem, we propose a network in which the information is transmitted sequentially. In this network architecture, the neighboring layer output adds the input of the current layer and iteratively passes its result to the next sequential layer. Thus, the proposed network can improve the learning efficiency and performance by successfully mitigating the complexity in deep networks. We show that the proposed network performs well on the Cifar-10 and Cifar-100 datasets. In particular, we prove that the proposed method is superior to the baseline method as the depth increases.
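
    An illustrative reading of the sequential residual connection (assuming PyTorch), in which each block receives its input plus the neighbouring block's output and passes the result one step forward; the letter's actual architecture may differ in detail.

    ```python
    import torch
    import torch.nn as nn

    class SequentialResidualNet(nn.Module):
        def __init__(self, width=32, depth=4):
            super().__init__()
            self.stem = nn.Conv2d(3, width, 3, padding=1)
            self.blocks = nn.ModuleList([
                nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
                for _ in range(depth)
            ])

        def forward(self, x):
            x = self.stem(x)
            prev = torch.zeros_like(x)
            for block in self.blocks:
                out = block(x + prev)     # add the neighbouring layer's output to the input
                prev, x = x, out          # pass the result to the next sequential layer
            return x

    print(SequentialResidualNet()(torch.randn(1, 3, 32, 32)).shape)
    ```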

  • Stock Price Prediction by Deep Neural Generative Model of News Articles

    Takashi MATSUBARA  Ryo AKITA  Kuniaki UEHARA  

     
    PAPER-Datamining Technologies

      Publicized:
    2018/01/19
      Vol:
    E101-D No:4
      Page(s):
    901-908

    In this study, we propose a deep neural generative model for predicting daily stock price movements given news articles. Approaches involving conventional technical analysis have been investigated to identify certain patterns in past price movements, which in turn helps to predict future price movements. However, the financial market is highly sensitive to specific events, including corporate buyouts, product releases, and the like. Therefore, recent research has focused on modeling relationships between these events that appear in the news articles and future price movements; however, a very large number of news articles are published daily, each article containing rich information, which results in overfitting to past price movements used for parameter adjustment. Given the above, we propose a model based on a generative model of news articles that includes price movement as a condition, thereby avoiding excessive overfitting thanks to the nature of the generative model. We evaluate our proposed model using historical price movements of Nikkei 225 and Standard & Poor's 500 Stock Index, confirming that our model predicts future price movements better than such conventional classifiers as support vector machines and multilayer perceptrons. Further, our proposed model extracts significant words from news articles that are directly related to future stock price movements.

  • Deep Relational Model: A Joint Probabilistic Model with a Hierarchical Structure for Bidirectional Estimation of Image and Labels

    Toru NAKASHIKA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2017/10/25
      Vol:
    E101-D No:2
      Page(s):
    428-436

    Two different types of representations, such as an image and its manually-assigned corresponding labels, generally have complex and strong relationships to each other. In this paper, we represent such deep relationships between two different types of visible variables using an energy-based probabilistic model, called a deep relational model (DRM), to improve prediction accuracy. A DRM stacks several layers from one visible layer on to another visible layer, sandwiching several hidden layers between them. As with restricted Boltzmann machines (RBMs) and deep Boltzmann machines (DBMs), all connections (weights) between two adjacent layers are undirected. During maximum likelihood (ML) based training, the network attempts to capture the latent complex relationships between the two visible variables with its deep architecture. Unlike deep neural networks (DNNs), 1) the DRM is a fully generative model, 2) it allows us to generate one visible variable given the other, and 3) its parameters can be optimized in a probabilistic manner. The DRM can also be fine-tuned in the same way as DNNs, as in deep belief net (DBN) or DBM pre-training. This paper presents experiments conducted to evaluate the performance of a DRM in image recognition and generation tasks using the MNIST data set. In the image recognition experiments, we observed that the DRM outperformed DNNs even without fine-tuning. In the image generation experiments, the images generated from the DRM were much more realistic than those from the other generative models.

  • A Threshold Neuron Pruning for a Binarized Deep Neural Network on an FPGA

    Tomoya FUJII  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER-Emerging Applications

      Publicized:
    2017/11/17
      Vol:
    E101-D No:2
      Page(s):
    376-386

    A pre-trained deep convolutional neural network (CNN) for an embedded system requires high speed and low power consumption. The front part of a CNN consists of convolutional layers, while the latter part consists of fully connected layers. In the convolutional layers, the multiply-accumulate operation is the bottleneck, while in the fully connected layers, memory access is the bottleneck. The binarized CNN has been proposed to realize many multiply-accumulate circuits on the FPGA, so the convolutional layers can be executed at high speed. However, even if we apply binarization to the fully connected layers, the amount of memory is still a bottleneck. In this paper, we propose a neuron pruning technique which eliminates almost all of the weight memory, and we apply it to the fully connected layers of the binarized CNN. In that case, since the weight memory is realized by on-chip memory on the FPGA, high-speed memory access is achieved. To further reduce the memory size, we retrain the CNN after neuron pruning. We also propose a sequential-input parallel-output circuit for the binarized fully connected layers and a streaming circuit for the binarized 2D convolutional layers. The experimental results showed that, by neuron pruning, the number of neurons in the fully connected layers of the VGG-11 CNN was reduced by 39.8% while keeping 99% of the baseline accuracy. We implemented the neuron-pruned CNN on the Xilinx Inc. Zynq Zedboard. Compared with the ARM Cortex-A57, it was 1773.0 times faster, dissipated 3.1 times less power, and its performance per power efficiency was 5781.3 times better. Also, compared with the Maxwell GPU, it was 11.1 times faster, dissipated 7.7 times less power, and its performance per power efficiency was 84.1 times better. Thus, the binarized CNN on the FPGA is suitable for embedded systems.
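
    Threshold-based neuron pruning of a fully connected layer can be sketched in a few lines (assuming NumPy): input neurons whose outgoing weights all fall below a threshold are dropped, shrinking the weight matrix. Sizes and the threshold are illustrative, not the paper's values.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(512, 1024))     # weights: 512 outputs x 1024 input neurons
    W[:, :300] *= 0.1                               # simulate neurons that learned only tiny weights

    threshold = 0.2
    # Keep an input neuron (column) only if at least one of its outgoing weights
    # exceeds the threshold in magnitude.
    keep = np.abs(W).max(axis=0) > threshold
    W_pruned = W[:, keep]

    print(f"kept {keep.sum()} of {W.shape[1]} neurons "
          f"({100 * (1 - keep.mean()):.1f}% pruned)")
    # After pruning, the network would normally be retrained to recover accuracy.
    ```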

  • Deep Learning-Based Fault Localization with Contextual Information

    Zhuo ZHANG  Yan LEI  Qingping TAN  Xiaoguang MAO  Ping ZENG  Xi CHANG  

     
    LETTER-Software Engineering

      Publicized:
    2017/09/08
      Vol:
    E100-D No:12
      Page(s):
    3027-3031

    Fault localization is essential for resolving software faults. Aiming at improving fault localization, this paper proposes a deep learning-based fault localization approach with contextual information. Specifically, our approach uses a deep neural network to construct a suspiciousness evaluation model that evaluates how suspicious a statement is of being faulty, and then leverages dynamic backward slicing to extract contextual information. The empirical results show that our approach significantly outperforms the state-of-the-art technique Dstar.

  • Multi-Channel Convolutional Neural Networks for Image Super-Resolution

    Shinya OHTANI  Yu KATO  Nobutaka KUROKI  Tetsuya HIROSE  Masahiro NUMA  

     
    PAPER-IMAGE PROCESSING

      Vol:
    E100-A No:2
      Page(s):
    572-580

    This paper proposes image super-resolution techniques with multi-channel convolutional neural networks. In the proposed method, output pixels are classified into K×K groups depending on their coordinates. Those groups are generated from separate channels of a convolutional neural network (CNN). Finally, they are synthesized into a K×K magnified image. This architecture can enlarge images directly without bicubic interpolation. Experimental results of 2×2, 3×3, and 4×4 magnifications have shown that the average PSNR for the proposed method is about 0.2dB higher than that for the conventional SRCNN.
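
    The K×K multi-channel synthesis corresponds closely to a pixel-shuffle rearrangement: the network emits K×K channels per pixel, which are then assembled into a K-times magnified image. A minimal sketch (assuming PyTorch) with illustrative layer sizes, not the paper's exact network.

    ```python
    import torch
    import torch.nn as nn

    K = 3
    net = nn.Sequential(
        nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, K * K, 3, padding=1),    # one channel per output-pixel group
        nn.PixelShuffle(K),                    # rearrange K*K channels into a KxK block
    )

    low_res = torch.randn(1, 1, 40, 40)
    high_res = net(low_res)
    print(high_res.shape)    # torch.Size([1, 1, 120, 120]) -- 3x magnification, no bicubic step
    ```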
